当量化神经网络以进行有效推断时,低位整数是效率的首选格式。但是,低位浮点数具有额外的自由度,分配了一些以指数级的工作。本文深入研究了神经网络推断的浮点格式的这种好处。我们详细介绍了可以为FP8格式做出的选择,包括对Mantissa和Exponent的位数的重要选择,并通过分析显示这些选择可以提供更好的性能。然后,我们展示了这些发现如何转化为真实网络,为FP8模拟提供有效的实现,以及一种新算法,该算法能够学习比例参数和FP8格式中的指数位数。我们的主要结论是,在对各种网络进行培训后量化时,就准确性而言,FP8格式优于INT8,并且指数位数量的选择是由网络中异常值的严重性驱动的。我们还通过量化感知训练进行实验,在训练网络以降低离群值的效果时,格式的差异消失。
translated by 谷歌翻译
在本文中,我们介绍了一种新颖的神经网络重量压缩方法。在我们的方法中,我们将重量张量存储为稀疏,量化的矩阵因子,其产品在推理过程中即时计算以生成目标模型的权重。我们使用预计的梯度下降方法来找到重量张量的量化和稀疏分解。我们表明,这种方法可以看作是重量SVD,矢量量化和稀疏PCA的统一。结合端到端微调,我们的方法超出了或与以前的最先进方法相提并论,就精度和模型大小之间的权衡而言。我们的方法适用于中等压缩方案,与矢量量化和极端压缩方案不同。
translated by 谷歌翻译
t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data. Most standard packages of t-SNE, such as scikit-learn, use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However, existing CPU implementations of this algorithm are inefficient. In this work, we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing sequential steps, and improving parallelization of multithreaded steps. Our implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a 32-core Intel(R) Icelake cloud instance.
translated by 谷歌翻译
The ultimate goal of artificial intelligence is to mimic the human brain to perform decision-making and control directly from high-dimensional sensory input. All-optical diffractive neural networks provide a promising solution for realizing artificial intelligence with high-speed and low-power consumption. To date, most of the reported diffractive neural networks focus on single or multiple tasks that do not involve interaction with the environment, such as object recognition and image classification, while the networks that can perform decision-making and control, to our knowledge, have not been developed yet. Here, we propose to use deep reinforcement learning to realize diffractive neural networks that enable imitating the human-level capability of decision-making and control. Such networks allow for finding optimal control policies through interaction with the environment and can be readily realized with the dielectric metasurfaces. The superior performances of these networks are verified by engaging three types of classic games, Tic-Tac-Toe, Super Mario Bros., and Car Racing, and achieving the same or even higher levels comparable to human players. Our work represents a solid step of advancement in diffractive neural networks, which promises a fundamental shift from the target-driven control of a pre-designed state for simple recognition or classification tasks to the high-level sensory capability of artificial intelligence. It may find exciting applications in autonomous driving, intelligent robots, and intelligent manufacturing.
translated by 谷歌翻译
Often clickbait articles have a title that is phrased as a question or vague teaser that entices the user to click on the link and read the article to find the explanation. We developed a system that will automatically find the answer or explanation of the clickbait hook from the website text so that the user does not need to read through the text themselves. We fine-tune an extractive question and answering model (RoBERTa) and an abstractive one (T5), using data scraped from the 'StopClickbait' Facebook pages and Reddit's 'SavedYouAClick' subforum. We find that both extractive and abstractive models improve significantly after finetuning. We find that the extractive model performs slightly better according to ROUGE scores, while the abstractive one has a slight edge in terms of BERTscores.
translated by 谷歌翻译
Transformers have become the state-of-the-art neural network architecture across numerous domains of machine learning. This is partly due to their celebrated ability to transfer and to learn in-context based on few examples. Nevertheless, the mechanisms by which Transformers become in-context learners are not well understood and remain mostly an intuition. Here, we argue that training Transformers on auto-regressive tasks can be closely related to well-known gradient-based meta-learning formulations. We start by providing a simple weight construction that shows the equivalence of data transformations induced by 1) a single linear self-attention layer and by 2) gradient-descent (GD) on a regression loss. Motivated by that construction, we show empirically that when training self-attention-only Transformers on simple regression tasks either the models learned by GD and Transformers show great similarity or, remarkably, the weights found by optimization match the construction. Thus we show how trained Transformers implement gradient descent in their forward pass. This allows us, at least in the domain of regression problems, to mechanistically understand the inner workings of optimized Transformers that learn in-context. Furthermore, we identify how Transformers surpass plain gradient descent by an iterative curvature correction and learn linear models on deep data representations to solve non-linear regression tasks. Finally, we discuss intriguing parallels to a mechanism identified to be crucial for in-context learning termed induction-head (Olsson et al., 2022) and show how it could be understood as a specific case of in-context learning by gradient descent learning within Transformers.
translated by 谷歌翻译
Content scanning systems employ perceptual hashing algorithms to scan user content for illegal material, such as child pornography or terrorist recruitment flyers. Perceptual hashing algorithms help determine whether two images are visually similar while preserving the privacy of the input images. Several efforts from industry and academia propose to conduct content scanning on client devices such as smartphones due to the impending roll out of end-to-end encryption that will make server-side content scanning difficult. However, these proposals have met with strong criticism because of the potential for the technology to be misused and re-purposed. Our work informs this conversation by experimentally characterizing the potential for one type of misuse -- attackers manipulating the content scanning system to perform physical surveillance on target locations. Our contributions are threefold: (1) we offer a definition of physical surveillance in the context of client-side image scanning systems; (2) we experimentally characterize this risk and create a surveillance algorithm that achieves physical surveillance rates of >40% by poisoning 5% of the perceptual hash database; (3) we experimentally study the trade-off between the robustness of client-side image scanning systems and surveillance, showing that more robust detection of illegal material leads to increased potential for physical surveillance.
translated by 谷歌翻译
Generative adversarial networks are a promising tool for image generation in the astronomy domain. Of particular interest are conditional generative adversarial networks (cGANs), which allow you to divide images into several classes according to the value of some property of the image, and then specify the required class when generating new images. In the case of images from Imaging Atmospheric Cherenkov Telescopes (IACTs), an important property is the total brightness of all image pixels (image size), which is in direct correlation with the energy of primary particles. We used a cGAN technique to generate images similar to whose obtained in the TAIGA-IACT experiment. As a training set, we used a set of two-dimensional images generated using the TAIGA Monte Carlo simulation software. We artificiallly divided the training set into 10 classes, sorting images by size and defining the boundaries of the classes so that the same number of images fall into each class. These classes were used while training our network. The paper shows that for each class, the size distribution of the generated images is close to normal with the mean value located approximately in the middle of the corresponding class. We also show that for the generated images, the total image size distribution obtained by summing the distributions over all classes is close to the original distribution of the training set. The results obtained will be useful for more accurate generation of realistic synthetic images similar to the ones taken by IACTs.
translated by 谷歌翻译
Decentralized learning with private data is a central problem in machine learning. We propose a novel distillation-based decentralized learning technique that allows multiple agents with private non-iid data to learn from each other, without having to share their data, weights or weight updates. Our approach is communication efficient, utilizes an unlabeled public dataset and uses multiple auxiliary heads for each client, greatly improving training efficiency in the case of heterogeneous data. This approach allows individual models to preserve and enhance performance on their private tasks while also dramatically improving their performance on the global aggregated data distribution. We study the effects of data and model architecture heterogeneity and the impact of the underlying communication graph topology on learning efficiency and show that our agents can significantly improve their performance compared to learning in isolation.
translated by 谷歌翻译
This paper focuses on the uncertainty estimation of white matter lesions (WML) segmentation in magnetic resonance imaging (MRI). On one side, voxel-scale segmentation errors cause the erroneous delineation of the lesions; on the other side, lesion-scale detection errors lead to wrong lesion counts. Both of these factors are clinically relevant for the assessment of multiple sclerosis patients. This work aims to compare the ability of different voxel- and lesion- scale uncertainty measures to capture errors related to segmentation and lesion detection respectively. Our main contributions are (i) proposing new measures of lesion-scale uncertainty that do not utilise voxel-scale uncertainties; (ii) extending an error retention curves analysis framework for evaluation of lesion-scale uncertainty measures. Our results obtained on the multi-center testing set of 58 patients demonstrate that the proposed lesion-scale measures achieves the best performance among the analysed measures. All code implementations are provided at https://github.com/NataliiaMolch/MS_WML_uncs
translated by 谷歌翻译